Flock: Crawling the Twitter Social Graph
Authors
Abstract
In this work, we developed Flock, a tool that crawls the Twitter social graph. To collect the graph, we built a Twitter application that uses 30 user accounts and the Twitter REST API v1.1. To date, Flock has collected the profiles of over 48 million Twitter users connected by over 158 million links, and it continues to actively crawl the more than 57 million valid Twitter user IDs it has discovered so far. We store the collected data in an open-source NoSQL database (MongoDB) for further social network analysis.
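The abstract does not describe Flock's crawl order or how it spreads requests across its 30 accounts. As an illustration only, the Python sketch below assumes a breadth-first traversal over follower IDs and a round-robin pool of accounts, each with its own fixed request budget per time window. The names `Account`, `AccountPool`, `crawl`, and `fetch_follower_ids`, the 15-minute window, and the 15-request budget are all assumptions, not details taken from Flock. `fetch_follower_ids` is a hypothetical stand-in for a real call such as the v1.1 `followers/ids` endpoint; it is mocked here so the example runs offline, and a simulated clock replaces real sleeping.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

WINDOW_SECONDS = 15 * 60  # assumed rate-limit window length, in seconds
REQUESTS_PER_WINDOW = 15  # assumed per-account request budget per window


@dataclass
class Account:
    """A hypothetical API credential with its own rate-limit budget."""
    name: str
    used: int = 0
    window_start: float = 0.0


class AccountPool:
    """Round-robin pool that hands out whichever account still has budget."""

    def __init__(self, names: List[str]) -> None:
        self.accounts = deque(Account(n) for n in names)
        if not self.accounts:
            raise ValueError("need at least one account")

    def acquire(self, now: float) -> Tuple[Optional[Account], float]:
        """Return (account, now) if any account has budget, else (None, earliest reset)."""
        earliest_reset = float("inf")
        for _ in range(len(self.accounts)):
            acct = self.accounts[0]
            self.accounts.rotate(-1)  # the next call starts from the following account
            if now - acct.window_start >= WINDOW_SECONDS:
                acct.used, acct.window_start = 0, now  # window expired: refill budget
            if acct.used < REQUESTS_PER_WINDOW:
                acct.used += 1
                return acct, now
            earliest_reset = min(earliest_reset, acct.window_start + WINDOW_SECONDS)
        return None, earliest_reset


def crawl(seed: int,
          fetch_follower_ids: Callable[[str, int], List[int]],
          pool: AccountPool,
          max_users: int) -> Tuple[Set[int], List[Tuple[int, int]], float]:
    """Breadth-first crawl from `seed`, fetching follower lists for at most `max_users` users.

    Returns (discovered user IDs, follower->followed edges, final simulated clock).
    """
    now = 0.0  # simulated clock, so the sketch never really sleeps
    visited: Set[int] = {seed}
    frontier = deque([seed])
    edges: List[Tuple[int, int]] = []
    crawled = 0
    while frontier and crawled < max_users:
        acct, t = pool.acquire(now)
        if acct is None:
            now = t  # every account is exhausted: wait for the earliest reset
            continue
        user = frontier.popleft()
        for follower in fetch_follower_ids(acct.name, user):
            edges.append((follower, user))
            if follower not in visited:
                visited.add(follower)
                frontier.append(follower)
        crawled += 1
    return visited, edges, now


if __name__ == "__main__":
    # Mock graph: user i is followed only by user i + 1 (a 40-user chain).
    graph = {i: [i + 1] for i in range(39)}
    ids, links, clock = crawl(0, lambda acct, u: graph.get(u, []),
                              AccountPool(["acct_a", "acct_b"]), max_users=40)
    print(len(ids), len(links), clock)  # 40 39 900.0
```

With two accounts the 40 lookups exceed the assumed 30-request budget of the first window, so the crawler waits once for the earliest reset; adding accounts raises the effective request rate roughly in proportion. In a setup like Flock's, the discovered IDs and edges would be written to the database rather than held in memory.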
Similar resources
Twitter Data Collection: Crawling Users, Neighbors and Their Communication for Personal Attribute Prediction in Social Media
Adaptive Identification of Hashtags for Real-Time Event Data Collection
The widespread use of microblogging services, such as Twitter, makes them a valuable tool for correlating people's personal opinions about popular public events. Researchers have capitalized on such tools to detect and monitor real-world events from this public, social perspective. Most Twitter event analysis approaches rely on event tweets collected through a set of pre-defined keywords. ...
Community Detection on Evolving Graphs
Clustering is a fundamental step in many information-retrieval and data-mining applications. Detecting clusters in graphs is also a key tool for finding the community structure in social and behavioral networks. In many of these applications, the input graph evolves over time in a continual and decentralized manner, and, to maintain a good clustering, the clustering algorithm needs to repeatedl...
A Faceted Crawler for the Twitter Service
Researchers nowadays have at their disposal valuable data from social networking applications, of which Twitter and Facebook are the most prominent examples. To retrieve this content, the Twitter service provides two distinct Application Programming Interfaces (APIs), a probe-based one and a streaming one, each of which imposes different limitations on the data collection process. In this paper, we...
Loklak - A Distributed Crawler and Data Harvester for Overcoming Rate Limits
Modern social networks have become sources of vast quantities of data. Access to such big data can be very useful for researchers and data scientists. In this paper we describe Loklak, an open-source, distributed peer-to-peer crawler and scraper that supports such research on platforms like Twitter, Weibo and other social networks. Social networks such as Twitter and Weibo pose ...